Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning

White, Isadora, Nottingham, Kolby, Maniar, Ayush, Robinson, Max, Lillemark, Hansen, Maheshwari, Mehul, Qin, Lianhui, Ammanabrolu, Prithviraj

arXiv.org Artificial Intelligence

Collaboration is ubiquitous and essential in day-to-day life -- from exchanging ideas, to delegating tasks, to generating plans together. This work studies how LLMs can adaptively collaborate to perform complex embodied reasoning tasks. To this end we introduce MINDcraft, an easily extensible platform built to enable LLM agents to control characters in the open-world game of Minecraft; and MineCollab, a benchmark to test the different dimensions of embodied and collaborative reasoning. An experimental study finds that the primary bottleneck in collaborating effectively for current state-of-the-art agents is efficient natural language communication, with agent performance dropping as much as 15% when they are required to communicate detailed task completion plans. We conclude that existing LLM agents are ill-optimized for multi-agent collaboration, especially in embodied scenarios, and highlight the need to employ methods beyond in-context and imitation learning. Our website can be found here: https://mindcraft-minecollab.github.io/


APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents

Chen, Jun Yu, Gao, Tao

arXiv.org Artificial Intelligence

We present APT, an advanced Large Language Model (LLM)-driven framework that enables autonomous agents to construct complex and creative structures within the Minecraft environment. Unlike previous approaches that primarily concentrate on skill-based open-world tasks or rely on image-based diffusion models for generating voxel-based structures, our method leverages the intrinsic spatial reasoning capabilities of LLMs. By employing chain-of-thought decomposition along with multimodal inputs, the framework generates detailed architectural layouts and blueprints that the agent can execute under zero-shot or few-shot learning scenarios. Our agent incorporates both memory and reflection modules to facilitate lifelong learning, adaptive refinement, and error correction throughout the building process. To rigorously evaluate the agent's performance in this emerging research area, we introduce a comprehensive benchmark consisting of diverse construction tasks designed to test creativity, spatial reasoning, adherence to in-game rules, and the effective integration of multimodal instructions. Experimental results using various GPT-based LLM backends and agent configurations demonstrate the agent's capacity to accurately interpret extensive instructions involving numerous items, their positions, and orientations. The agent successfully produces complex structures complete with internal functionalities such as Redstone-powered systems. A/B testing indicates that the inclusion of a memory module leads to a significant increase in performance, emphasizing its role in enabling continuous learning and the reuse of accumulated experience. Additionally, the agent's unexpected emergence of scaffolding behavior highlights the potential of future LLM-driven agents to utilize subroutine planning and leverage the emergent abilities of LLMs to autonomously develop human-like problem-solving techniques.
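The blueprint-then-execute split described above can be pictured with a small sketch. The entry format (`x`, `y`, `z`, `block`) and the `execute_blueprint` helper are illustrative assumptions, not APT's actual data model:

```python
def execute_blueprint(blueprint):
    """Place each blueprint entry into a sparse voxel world -- the
    deterministic execution half of a plan-then-build pipeline."""
    world = {}
    for x, y, z, block in blueprint:
        world[(x, y, z)] = block  # later entries overwrite earlier ones
    return world

# A 2x2 stone floor with a torch on one corner (toy example)
plan = [(0, 0, 0, "stone"), (1, 0, 0, "stone"),
        (0, 0, 1, "stone"), (1, 0, 1, "stone"),
        (0, 1, 0, "torch")]
world = execute_blueprint(plan)
```

In a real agent, the LLM's job is to produce `plan` from a natural-language brief, while execution, error checking, and reflection happen against the world state.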


MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs

Yu, Xianhao, Fu, Jiaqi, Deng, Renjia, Han, Wenjuan

arXiv.org Artificial Intelligence

While Vision-Language Models (VLMs) hold promise for tasks requiring extensive collaboration, traditional multi-agent simulators have facilitated rich explorations of an interactive artificial society that reflects collective behavior. However, these existing simulators face significant limitations. Firstly, they struggle to handle large numbers of agents due to high resource demands. Secondly, they often assume agents possess perfect information and limitless capabilities, hindering the ecological validity of simulated social interactions. To bridge this gap, we propose MineLand, a multi-agent Minecraft simulator that introduces three key features: large-scale scalability, limited multimodal senses, and physical needs. Our simulator supports 64 or more agents. Agents have limited visual, auditory, and environmental awareness, forcing them to actively communicate and collaborate to fulfill physical needs like food and resources. Additionally, we introduce an AI agent framework, Alex, inspired by multitasking theory, enabling agents to handle intricate coordination and scheduling. Our experiments demonstrate that the simulator, the corresponding benchmark, and the AI agent framework contribute to more ecological and nuanced collective behavior. The source code of MineLand and Alex is openly available at https://github.com/cocacola-lab/MineLand.


Integration of 4D BIM and Robot Task Planning: Creation and Flow of Construction-Related Information for Action-Level Simulation of Indoor Wall Frame Installation

Oyediran, Hafiz, Turner, William, Kim, Kyungki, Barrows, Matthew

arXiv.org Artificial Intelligence

An obstacle toward construction robotization is the lack of methods to plan robot operations within the entire construction planning process. Despite the strength in modeling construction site conditions, 4D BIM technologies cannot perform construction robot task planning considering the contexts of given work environments. To address this limitation, this study presents a framework that integrates 4D BIM and robot task planning, presents an information flow for the integration, and performs high-level robot task planning and detailed simulation. The framework uniquely incorporates a construction robot knowledge base that derives robot-related modeling requirements to augment a 4D BIM model. Then, the 4D BIM model is converted into a robot simulation world where a robot performs a sequence of actions retrieving construction-related information. A case study focusing on interior wall frame installation demonstrates the potential of systematic integration in achieving context-aware robot task planning and simulation in construction environments.

Highlights: Simulated a mobile robot's actions to install wall frames in a residential building.

1. Introduction

Rapid advancements in robotics technologies are making the utilization of robots for dangerous, tedious, and repetitive tasks more and more practical [1]. Unlike traditional industrial robots with fixed behaviors, modern robots with mobile platforms, sensors, and actuators can be programmed to perform given tasks while intelligently adapting to changing work environments. Many sectors, including manufacturing [2], rescue [3], agriculture [4], and healthcare [5], are adopting robots to automate existing processes to achieve greater productivity and safety. Many construction tasks are repetitive and labor-intensive by nature [7,8], and thus robotization of these tasks can potentially address many chronic problems, such as stagnant productivity growth [9], labor shortage [10], and work-related diseases/fatalities [11]. A growing number of robotic solutions are being introduced by academic studies [12,13] and industrial applications (excavation and leveling [14], marking of layout [15], rebar tying [16], and bricklaying [17,18]). With this trend, construction sites are expected to become crowded with robots and human workers in the near future, exposing human workers to robot-related hazards such as collisions, crushing, trapping, and mechanical part accidents [19]. In order to utilize robots safely and effectively in congested construction environments, both high-level task planning and detailed simulation of construction robots should be performed as part of the overall construction planning process. Despite the abundant studies on coordination between human work crews [20,21], none of the prior studies has incorporated robot operations into the construction planning process.


Enabling BIM-Driven Robotic Construction Workflows with Closed-Loop Digital Twins

Wang, Xi, Yu, Hongrui, McGee, Wes, Menassa, Carol C., Kamat, Vineet R.

arXiv.org Artificial Intelligence

The introduction of assistive construction robots can significantly alleviate physical demands on construction workers. Leveraging a Building Information Model (BIM) offers a natural and promising approach to driving a robotic construction workflow. However, because of uncertainties inherent in construction sites, such as discrepancies between the as-designed and as-built components, robots cannot solely rely on a BIM to plan and perform field construction work. Human workers are adept at improvising alternative plans with their creativity and experience and thus can assist robots in overcoming uncertainties and performing construction work successfully. In such scenarios, it is critical to continuously update the BIM as work processes unfold so that it includes as-built information for the ensuing construction and maintenance tasks. This research introduces an interactive closed-loop digital twin framework that integrates a BIM into human-robot collaborative construction workflows. The robot's functions are primarily driven by the BIM, but it adaptively adjusts its plans based on actual site conditions, while the human co-worker oversees and supervises the process. When necessary, the human co-worker intervenes in the robot's plan by changing the task sequence or workspace geometry or requesting a new motion plan to help the robot overcome the encountered uncertainties. Experiments involving block pick-and-place tasks are carried out to verify system performance using an industrial robotic arm in a research laboratory setting that mimics a construction site. In addition, a drywall installation case study is conducted to validate the system. Integrating the flexibility of human workers and the autonomy and accuracy afforded by BIMs, the proposed framework offers significant promise of increasing the robustness of construction robots in the performance of field construction work.


Habits of Mind: Reusing Action Sequences for Efficient Planning

Éltető, Noémi, Dayan, Peter

arXiv.org Artificial Intelligence

When we exercise sequences of actions, their execution becomes more fluent and precise. Here, we consider the possibility that exercised action sequences can also be used to make planning faster and more accurate by focusing expansion of the search tree on paths that have been frequently used in the past, and by reducing deep planning problems to shallow ones via multi-step jumps in the tree. To capture such sequences, we use a flexible Bayesian action chunking mechanism which finds and exploits statistically reliable structure at different scales. This gives rise to shorter or longer routines that can be embedded into a Monte-Carlo tree search planner. We show the benefits of this scheme using a physical construction task patterned after tangrams.
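The chunk-extraction idea can be approximated with a plain frequency count: sub-sequences that recur often enough across episodes are promoted to reusable routines (macro-actions) for the planner. This is a deliberately simplified stand-in for the paper's Bayesian chunking model; the function name, thresholds, and toy action names are assumptions:

```python
from collections import Counter

def extract_chunks(action_seqs, max_len=4, min_count=3):
    """Collect sub-sequences (chunks) that recur often enough across
    episodes to be treated as reliable routines for a planner."""
    counts = Counter()
    for seq in action_seqs:
        for n in range(2, max_len + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    return [list(c) for c, k in counts.items() if k >= min_count]

# 'pick, rotate, place' recurs across episodes, so it becomes a chunk
episodes = [
    ["pick", "rotate", "place", "pick", "rotate", "place"],
    ["move", "pick", "rotate", "place"],
    ["pick", "rotate", "place", "move"],
]
chunks = extract_chunks(episodes)
```

Embedded in a tree search, each such chunk lets the planner jump several actions in one expansion step, turning a deep planning problem into a shallower one.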


MARC: A multi-agent robots control framework for enhancing reinforcement learning in construction tasks

Duan, Kangkang, Suen, Christine Wun Ki, Zou, Zhengbo

arXiv.org Artificial Intelligence

Letting robots emulate human behavior has always posed a challenge, particularly in scenarios involving multiple robots. In this paper, we present a framework aimed at achieving multi-agent reinforcement learning for robot control in construction tasks. The construction industry often necessitates complex interactions and coordination among multiple robots, demanding a solution that enables effective collaboration and efficient task execution. Our proposed framework leverages the principles of proximal policy optimization (PPO) and develops a multi-agent version that enables the robots to acquire sophisticated control policies. We evaluated the effectiveness of our framework by learning four different collaborative tasks in construction environments. The results demonstrated the capability of our approach in enabling multiple robots to learn and adapt their behaviors in complex construction tasks while effectively preventing collisions. Results also revealed the potential of combining and exploring the advantages of reinforcement learning algorithms and inverse kinematics. The findings from this research contribute to the advancement of multi-agent reinforcement learning in the domain of construction robotics. By enabling robots to behave like their human counterparts and collaborate effectively, we pave the way for more efficient, flexible, and intelligent construction processes.
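At the core of any PPO-based variant such as the one above is the clipped surrogate objective. A minimal per-sample sketch (the function name and framing are illustrative, not MARC's code):

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO's clipped objective for one sample: the policy-probability
    ratio is clipped to [1-eps, 1+eps] before weighting the advantage,
    which bounds how far a single update can move the policy."""
    return float(np.minimum(ratio * advantage,
                            np.clip(ratio, 1 - eps, 1 + eps) * advantage))
```

In a multi-agent setting, each robot contributes its own (ratio, advantage) samples, and the losses are aggregated per agent or across a shared policy before the gradient step.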


Learning from demonstrations: An intuitive VR environment for imitation learning of construction robots

Duan, Kangkang, Zou, Zhengbo

arXiv.org Artificial Intelligence

Construction robots are challenging the traditional paradigm of labor-intensive and repetitive construction tasks. Present concerns regarding construction robots center on their ability to perform complex tasks consisting of several subtasks and their adaptability to unstructured and dynamic construction environments. Imitation learning (IL) has shown advantages in training a robot to imitate expert actions in complex tasks, and the policy thereafter generated by reinforcement learning (RL) is more adaptive in comparison with pre-programmed robots. In this paper, we propose a framework composed of two modules for imitation learning of construction robots. The first module provides an intuitive Virtual Reality (VR) platform for collecting expert demonstrations, where a robot automatically follows the position, rotation, and actions of the expert's hand in real time, instead of requiring an expert to control the robot via controllers. The second module provides a template for imitation learning using the observations and actions recorded in the first module. In the second module, Behavior Cloning (BC) is utilized for pre-training, and Generative Adversarial Imitation Learning (GAIL) and Proximal Policy Optimization (PPO) are combined to achieve a trade-off between imitation and exploration. Results show that imitation learning, especially when combined with PPO, can significantly accelerate training in limited training steps and improve policy performance.
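The BC pre-training stage above is, at heart, supervised learning on recorded (observation, action) pairs. A toy NumPy stand-in with a linear policy and discrete actions (real pipelines use neural policies and continuous arm controls):

```python
import numpy as np

def behavior_cloning(obs, acts, lr=0.1, epochs=500):
    """Fit a linear softmax policy pi(a|s) by minimizing cross-entropy
    against the expert's discrete actions -- the BC pre-training step."""
    n_act = acts.max() + 1
    W = np.zeros((obs.shape[1], n_act))
    onehot = np.eye(n_act)[acts]
    for _ in range(epochs):
        logits = obs @ W
        p = np.exp(logits - logits.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)
        W -= lr * obs.T @ (p - onehot) / len(obs)  # cross-entropy gradient
    return W

# Toy expert: action 1 when the feature is positive, else action 0
obs = np.array([[1.0], [2.0], [-1.0], [-2.0]])
acts = np.array([1, 1, 0, 0])
W = behavior_cloning(obs, acts)
pred = (obs @ W).argmax(1)
```

After this warm start, the GAIL discriminator and PPO updates take over, trading imitation fidelity against exploration of states the expert never visited.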


Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT

You, Hengxu, Ye, Yang, Zhou, Tianyu, Zhu, Qi, Du, Jing

arXiv.org Artificial Intelligence

Robot-based assembly in construction has emerged as a promising solution to address numerous challenges such as increasing costs, labor shortages, and the demand for safe and efficient construction processes. One of the main obstacles in realizing the full potential of these robotic systems is the need for effective and efficient sequence planning for construction tasks. Current approaches, including mathematical and heuristic techniques or machine learning methods, face limitations in their adaptability and scalability to dynamic construction environments. To expand the sequential-understanding ability of current robot systems, this paper introduces RoboGPT, a novel system that leverages the advanced reasoning capabilities of ChatGPT, a large language model, for automated sequence planning in robot-based assembly applied to construction tasks. The proposed system adapts ChatGPT for construction sequence planning and demonstrates its feasibility and effectiveness through an experimental evaluation comprising two case studies and 80 trials on real construction tasks. The results show that RoboGPT-driven robots can handle complex construction operations and adapt to changes on the fly. This paper contributes to the ongoing efforts to enhance the capabilities and performance of robot-based assembly systems in the construction industry, and it paves the way for further integration of large language model technologies in the field of construction robotics.
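A pipeline like the one above cannot simply trust an LLM's proposed ordering; a deterministic check against precedence constraints is a natural guard. The sketch below is a hypothetical validation step (the part names and constraint format are invented for illustration, not taken from RoboGPT):

```python
def valid_sequence(plan, precedes):
    """Check a proposed assembly order against precedence constraints:
    every prerequisite part must be placed before its dependent."""
    pos = {task: i for i, task in enumerate(plan)}
    return all(pos[a] < pos[b] for a, b in precedes)

# Toy precedence graph for a small frame assembly (illustrative names)
constraints = [("footing", "column"), ("column", "beam"), ("beam", "deck")]
ok = valid_sequence(["footing", "column", "beam", "deck"], constraints)
bad = valid_sequence(["column", "footing", "beam", "deck"], constraints)
```

On a failed check, the system can feed the violated constraint back into the prompt and ask the model to replan, which is one plausible mechanism behind "adapting to changes on the fly."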


Optimizing robotic swarm based construction tasks

Liyanage, Teshan, Fernando, Subha

arXiv.org Artificial Intelligence

Social insects in nature, such as ants, termites, and bees, construct their colonies collaboratively in a very efficient process. In these swarms, each insect contributes to the construction task individually, showing the redundant and parallel behavior of individual entities. However, robotic adaptations of these swarm behaviors have not yet reached real-world use at any significant scale, due to limitations in existing approaches to swarm robotic construction. This paper presents an approach that combines the existing swarm construction approaches, resulting in a swarm robotic system capable of constructing a given two-dimensional shape in an optimized manner.